Deep learning is a powerful approach for unstructured data such as text, sound, video and images. It has many applications in image classification and object detection, such as classifying images of dogs and cats, detecting different objects in an image, or performing facial recognition.
knitr::include_graphics("https://cis-8392-assignment-3.s3.amazonaws.com/intro_image.gif")
In this article, we will build a simple image classifier that predicts whether a presented image is an airplane, car, cat, dog, flower, fruit, motorbike or a person.
This dataset contains 6,899 images from 8 distinct classes compiled from various sources: airplane, car, cat, dog, flower, fruit, motorbike and person. It is available on Kaggle at https://www.kaggle.com/datasets/prasunroy/natural-images?datasetId=42780&language=null
I created an S3 bucket in AWS and hosted my screenshot image in it, granting public read access to the object so that anyone with the URL can view the image.
url <- "https://cis-8392-assignment-3.s3.amazonaws.com/Screenshot+of+the+Code+section+of+the+Kaggle+dataset+filtered+with+R.PNG"
knitr::include_url(url)
library(keras)
library(tensorflow)
library(tidyverse)
library(imager)
library(caret)
library(grid)
library(gridExtra)
Let’s explore the data first before building the model. In image classification problems, it is common practice to put the images in separate folders based on the target class/label. For example, inside the train folder of our data you can see that we have 8 different folders, one each for airplane, car, cat, dog, flower, fruit, motorbike and person.
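The chunk that produced the folder listing below is not shown; a minimal sketch of how it could be done (assuming the dataset was extracted to natural_images_small/train/, as the file paths later in this article suggest) would be:

```r
# List the class sub-folders inside the training directory
folder_list <- list.files("natural_images_small/train/")
folder_list

# Full path to each class folder, used later to collect the file names
folder_path <- paste0("natural_images_small/train/", folder_list, "/")
```
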
## [1] "airplane" "car" "cat" "dog" "flower" "fruit"
## [7] "motorbike" "person"
## [[1]]
## Image. Width: 100 pix Height: 100 pix Depth: 1 Colour channels: 3
##
## [[2]]
## Image. Width: 188 pix Height: 121 pix Depth: 1 Colour channels: 3
##
## [[3]]
## Image. Width: 100 pix Height: 100 pix Depth: 1 Colour channels: 3
##
## [[4]]
## Image. Width: 114 pix Height: 111 pix Depth: 1 Colour channels: 3
##
## [[5]]
## Image. Width: 416 pix Height: 251 pix Depth: 1 Colour channels: 3
##
## [[6]]
## Image. Width: 348 pix Height: 429 pix Depth: 1 Colour channels: 3
##
## [[7]]
## Image. Width: 327 pix Height: 203 pix Depth: 1 Colour channels: 3
##
## [[8]]
## Image. Width: 100 pix Height: 100 pix Depth: 1 Colour channels: 3
##
## [[9]]
## Image. Width: 100 pix Height: 100 pix Depth: 1 Colour channels: 3
##
## [[10]]
## Image. Width: 256 pix Height: 256 pix Depth: 1 Colour channels: 3
##
## [[11]]
## Image. Width: 125 pix Height: 163 pix Depth: 1 Colour channels: 3
##
## [[12]]
## Image. Width: 100 pix Height: 100 pix Depth: 1 Colour channels: 3
file_name <- map(folder_path, function(x) paste0(x, list.files(x))) %>%
  unlist()
head(file_name)
## [1] "natural_images_small/train/airplane/airplane_0001.jpg"
## [2] "natural_images_small/train/airplane/airplane_0002.jpg"
## [3] "natural_images_small/train/airplane/airplane_0003.jpg"
## [4] "natural_images_small/train/airplane/airplane_0004.jpg"
## [5] "natural_images_small/train/airplane/airplane_0005.jpg"
## [6] "natural_images_small/train/airplane/airplane_0006.jpg"
tail(file_name)
## [1] "natural_images_small/train/person/person_0555.jpg"
## [2] "natural_images_small/train/person/person_0556.jpg"
## [3] "natural_images_small/train/person/person_0557.jpg"
## [4] "natural_images_small/train/person/person_0558.jpg"
## [5] "natural_images_small/train/person/person_0559.jpg"
## [6] "natural_images_small/train/person/person_0560.jpg"
length(file_name)
## [1] 4480
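Beyond the total of 4,480 files, it can be useful to see how the training images are distributed across the 8 classes. A small sketch, reusing the file_name vector built above (the class name is simply the parent folder of each file):

```r
# Count images per class: extract the parent folder name of each path
class_counts <- table(basename(dirname(file_name)))
class_counts
```
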
One important aspect of image classification is understanding the dimensions of the input images. You need to know the distribution of image dimensions in order to choose a proper input size for the deep learning model. Let’s check the properties of the first image.
# Full Image Description
img <- load.image(file_name[1])
img
## Image. Width: 286 pix Height: 113 pix Depth: 1 Colour channels: 3
This gives us information about the dimensions of the image. The height and width are measured in pixels. The number of colour channels indicates whether the image is in grayscale format (colour channels = 1) or in RGB format (colour channels = 3). To get the value of each dimension, we can use the dim() function. It returns the width, height, depth, and number of channels.
# Image Dimension
dim(img)
## [1] 286 113 1 3
So we have successfully loaded an image and obtained its dimensions. In the following code, we create a function that extracts the height and width of an image.
# Function for acquiring the width and height of an image
get_dim <- function(x){
  img <- load.image(x)
  df_img <- data.frame(height = height(img),
                       width = width(img),
                       filename = x)
  return(df_img)
}
get_dim(file_name[1])
## height width filename
## 1 113 286 natural_images_small/train/airplane/airplane_0001.jpg
Now we will sample 1,000 images from the file names and get the height and width of each. We sample here because loading all of the images would take quite a long time.
# Randomly get 1000 sample images
set.seed(123)
sample_file <- sample(file_name, 1000)
# Run the get_dim() function for each image
file_dim <- map_df(sample_file, get_dim)
head(file_dim, 10)
## height width filename
## 1 237 261 natural_images_small/train/flower/flower_0223.jpg
## 2 281 328 natural_images_small/train/flower/flower_0271.jpg
## 3 319 358 natural_images_small/train/dog/dog_0547.jpg
## 4 85 287 natural_images_small/train/airplane/airplane_0526.jpg
## 5 256 256 natural_images_small/train/person/person_0371.jpg
## 6 100 100 natural_images_small/train/fruit/fruit_0186.jpg
## 7 308 238 natural_images_small/train/dog/dog_0162.jpg
## 8 372 458 natural_images_small/train/cat/cat_0022.jpg
## 9 122 201 natural_images_small/train/motorbike/motorbike_0011.jpg
## 10 100 201 natural_images_small/train/motorbike/motorbike_0086.jpg
Now let’s get the statistics for the image dimensions.
summary(file_dim)
## height width filename
## Min. : 54.0 Min. : 67.0 Length:1000
## 1st Qu.: 100.0 1st Qu.: 100.0 Class :character
## Median : 125.0 Median : 237.5 Mode :character
## Mean : 201.4 Mean : 240.9
## 3rd Qu.: 256.0 3rd Qu.: 297.0
## Max. :2665.0 Max. :2737.0
The image dimensions vary greatly. Some images are as small as 54 pixels in height, while others are up to 2,737 pixels wide. Understanding the image dimensions will help us in the next part of the process: data preprocessing.
The images are rescaled so that the pixel values are normalized to the range 0 to 1. This keeps the numbers small, which makes the computation easier and faster. Since raw pixel values range from 0 to 255, dividing every value by 255 converts them to the range 0 to 1.
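To make the rescaling concrete, here is a tiny check of the 1/255 normalization (the pixel values are chosen purely for illustration):

```r
# Raw 8-bit pixel values range from 0 to 255
px <- c(0, 128, 255)

# Dividing by 255 maps them into the [0, 1] interval
px / 255
```
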
train_datagen <- image_data_generator(rescale = 1/255)
validation_datagen <- image_data_generator(rescale = 1/255)
test_datagen <- image_data_generator(rescale = 1/255)
Data preprocessing for images is pretty simple and can be done in a single step in the following section. Data augmentation is a useful technique for increasing the size of the training set without acquiring new images, but since we already have a large enough training set to build and train the deep learning convnet model, we do not use it here. The generators read the images and convert them to tensors while rescaling the pixel values to the [0, 1] interval.
Now we can feed our image data into the generators using flow_images_from_directory(). The training data is located inside the train folder within the natural_images_small folder, so the directory is natural_images_small/train. We do the same for the validation and test data.
train_generator <- flow_images_from_directory(
"natural_images_small/train", # Target directory
train_datagen, # Training data generator
target_size = c(150, 150), # Resizes all images to 150 × 150
batch_size = 20, # 20 samples in one batch
class_mode = "categorical" # Because we use categorical_crossentropy loss,
# we need categorical labels.
)
validation_generator <- flow_images_from_directory(
"natural_images_small/validation",
validation_datagen,
target_size = c(150, 150),
batch_size = 20,
class_mode = "categorical"
)
test_generator <- flow_images_from_directory(
"natural_images_small/train",
test_datagen,
target_size = c(150, 150),
batch_size = 20,
class_mode = "categorical"
)
A convnet model has been built for the deep learning classification, using relu activation functions in the hidden layers and a softmax activation function in the output layer. The model was built and trained in a separate R script and saved, so that here it can simply be loaded again.
model_file = "natural_images_model.h5"
history_file = "natural_images_fit_history.rds"
model_v2 <- load_model_hdf5(model_file)
history_v2 <- read_rds(history_file)
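Since the architecture lives in the separate script, here is a hedged sketch of the kind of convnet this could be. The layer sizes and depths are illustrative assumptions, not the exact saved model; what the rest of the article does fix is the 150 × 150 × 3 input from the generators, the 8-unit softmax output, and the categorical_crossentropy loss.

```r
# Illustrative convnet sketch (NOT the exact saved model)
model_sketch <- keras_model_sequential() %>%
  layer_conv_2d(filters = 32, kernel_size = c(3, 3), activation = "relu",
                input_shape = c(150, 150, 3)) %>%
  layer_max_pooling_2d(pool_size = c(2, 2)) %>%
  layer_conv_2d(filters = 64, kernel_size = c(3, 3), activation = "relu") %>%
  layer_max_pooling_2d(pool_size = c(2, 2)) %>%
  layer_flatten() %>%
  layer_dense(units = 512, activation = "relu") %>%
  layer_dense(units = 8, activation = "softmax")   # one unit per class

model_sketch %>% compile(
  loss = "categorical_crossentropy",   # matches class_mode = "categorical"
  optimizer = "rmsprop",
  metrics = c("acc")
)
```
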
Plotting the loss and accuracy for training and validation data.
#Plotting the accuracy and loss for training and validation data
plot(history_v2)
#Evaluating the model on test data
model_v2 %>%
evaluate_generator(test_generator, steps = 50)
## $loss
## [1] 0.06363041
##
## $acc
## [1] 0.982
The model achieves around 98% accuracy on the test data.
1. I have built a convnet model for predicting 8 classes from the natural images dataset.
2. From the visualization of loss and accuracy versus epoch, I observed that the accuracy of the model improves with more epochs.
3. Based on the variation of loss across epochs, I could observe an overfitting issue; data augmentation could be used to overcome the overfitting.
4. With around 30 epochs, the model gave an accuracy of around 98% on the testing data.
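Point 3 above mentions data augmentation as a remedy for overfitting. It can be enabled directly in the training generator; a sketch with illustrative parameter values follows (the validation and test generators should keep plain rescaling only):

```r
# Augmented training generator: random transforms applied on the fly
train_datagen_aug <- image_data_generator(
  rescale = 1/255,
  rotation_range = 40,       # random rotations up to 40 degrees
  width_shift_range = 0.2,   # random horizontal shifts (fraction of width)
  height_shift_range = 0.2,  # random vertical shifts (fraction of height)
  shear_range = 0.2,
  zoom_range = 0.2,
  horizontal_flip = TRUE
)
```
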